ABSTRACT

The purpose of this study was to establish score 
equivalencies between the College Board Scholastic Aptitude Test 
(SAT) and its Spanish-language equivalent, the College Board Prueba
de Aptitud Academica (PAA). The method of the study involved two
phases: the selection of test items equally appropriate for Spanish-
and English-speaking students for use in equating the two tests; and
the equating analysis itself. The method of the first phase was to 
choose two sets of items, one originally appearing in Spanish, the 
other originally appearing in English; to translate each set into the 
other language; and to administer both sets in the appropriate 
language mode for pretest purposes to both types of students. These
administrations were conducted in the fall of 1970 with samples of
candidates taking the PAA or the SAT at regularly scheduled
administrations. They provided data regarding the difficulty and 
discrimination power of each item for each of the two groups, and an 
index of appropriateness of each item for both groups. On the basis 
of the analyses of these data, two sets of items, one verbal and the 
other mathematical, were chosen and assembled as "common items" to be 
used for equating. In the second phase of the study, these "common
items," appearing in Spanish and in English, were administered in the 
appropriate language along with the operational form of the PAA in 
November 1971 and with the operational form of the SAT in January 
1972. The data resulting from the administrations of these "common 
items" were used to calibrate for differences in the abilities of the 
two groups and permitted both linear and equipercentile equating of 
the two tests. Conversion tables are provided. (Author/DB) 
Preface



When in the 1960s the College Board created a test in the Spanish language 
modeled on the Scholastic Aptitude Test (SAT), the intention was to offer the new
test as a form of technical assistance to educators in Puerto Rico and other 
Spanish-speaking areas who might wish to use this kind of testing to facilitate the 
transition from school to college. But, as so often happens, the successful
development of the Spanish-language Prueba de Aptitud Academica (PAA) has
created a variety of uses or proposed uses of the test which were not provided for in
the initial planning. 

Administrators in continental United States institutions have quite reasonably 
asked if they can use the PAA as part of the evidence on which to base decisions
concerning the admission of Spanish-speaking students. In addition, many edu- 
cators who are concerned with improving access to higher education for those 
United States residents and citizens with Spanish backgrounds (Mexican Ameri- 
cans and Puerto Ricans are the largest groups, but there are others) have won- 
dered if aptitude testing in Spanish might not be more appropriate for some of 
these students than the present English language testing is. 

These are complicated questions which will require time and extensive experi- 
mentation to answer. But one of the preliminary steps necessary for beginning to 
deal with them is to develop methods of equating the Spanish-language PAA and
the English-language SAT, so that a particular score on one scale will have the
same educational and psychological meaning as a definite score on the other scale. 

Equating the several English-language forms of the SAT to one another has been
a regular practice for many years and is an important advantage of the Board's
program. A student may take any form of the test offered throughout the calendar
year with confidence that the score reported to a college will not be influenced by
the date of the test, the competition on that particular date, or any other factors 
except his own ability and, to a slight extent, the inevitable errors of measure- 
ment. At first glance, it would seem a relatively simple step to extend these usual 
equating techniques to the PAA-SAT relationship. But at second glance, the job
turns out to be impossible. 

The meaning of equated test forms when both are in the same language is in it- 
self not simple. If two forms of a test are properly equated, any given scaled score 
earned by a student on one form represents the same level of ability or achieve- 
ment as does the same scaled score earned by another student on another form. 
Similarly, if two forms are properly equated, the score earned by a student on one
form is most likely the score he would have earned on the other form had he taken
that other form at exactly the same time and under exactly the same conditions.
These ideas, "the same level of ability" and "the score he would have earned . . .
had he taken that other form at exactly the same time," are difficult enough to
think about and to convert into operational terms under ordinary circumstances. 
But when two languages and cultures are involved, there is no way at all to deal 
with the idea of level of difficulty with the kind of precision that is ordinarily
expected of equating procedures.

But, as occasionally happens, we can admit the impossibility of a task and still 
undertake to do it as well as we can. The authors in this case have devised an
extremely ingenious method for equating the PAA and SAT scales in spite of the
difficulties. Rather than attempt to force the two tests to yield equated scores,
they have developed tables that show what a particular
score on one of the examinations would be equivalent to on the other examination.

It is important to emphasize, as the authors do in the text, that the equating 
tables thus provided were developed for one particular population — Puerto Rican 
students in Puerto Rico — and are not known to be accurate for, say, Cubans or 
Colombians or Puerto Ricans in New York. Certainly, the suitability of the PAA
(plus equating tables) for a Chicano student in Los Angeles is a question even fur- 
ther removed from the data of this research. 

It is also important to say that equating the PAA to the SAT does not confer
validity upon the PAA in circumstances where the SAT is known to give useful
predictions of college performance. Each test should be validated by studying its
usefulness in each institution where its use is contemplated.

This equating experiment must not be thought of only from the point of view of 
the SAT and the PAA. It is an important step forward in examining the problem of
cross-cultural testing, and may have useful applications in many other settings 
where students must either study in languages different from the ones spoken in 
their homes, or where there is some other reason to question the equivalence of 
scores across linguistic and cultural distances. 

The educator who must use test scores may use the tables provided here to do 
his work more effectively — but always with the caution indicated by the limita- 
tions of the study. In addition, the student of testing and the connoisseur of equat- 
ing will find this a splendid example of his science and art. 

S. A. Kendrick 

Chief, Division of Research Studies and Services 
College Entrance Examination Board 



Abstract 

The purpose of this study was to establish score equivalencies between the College
Board Scholastic Aptitude Test (SAT) and its Spanish-language equivalent, the
College Board Prueba de Aptitud Academica (PAA). The method of the study
involved two phases: the selection of test items equally appropriate for Spanish- and
English-speaking students for use in equating the two tests; and the equating
analysis itself. The method of the first phase was to choose two sets of items, one
originally appearing in Spanish, the other originally appearing in English; to
translate each set into the other language; and to administer both sets in the
appropriate language mode for pretest purposes to both types of students. These
administrations were conducted in the fall of 1970 with samples of candidates taking
the PAA or the SAT at regularly scheduled administrations. They provided data regarding
the difficulty and discrimination power of each item for each of the two groups,
and, what was of special interest, an index of appropriateness of each item for
both groups.

On the basis of the analyses of these data, two sets of items, one verbal and the
other mathematical, were chosen and assembled as "common items" to be used
for equating. In the second phase of the study these "common items," appearing
in Spanish and also in English, were administered in the appropriate language
along with the operational form of the PAA in November 1971 and with the
operational form of the SAT in January 1972. The data resulting from the
administrations of these "common items" were used to calibrate for differences in the
abilities of the two groups of candidates and permitted both linear and equipercentile
equating of the two tests. Conversion tables relating the PAA-verbal scores to the
SAT-verbal scores and the PAA-mathematical scores to the SAT-mathematical
scores are given in the Appendix (pages 35-37). These conversions represent an
average of the linear and equipercentile results. Because of the scarcity of data at
the upper end of the distribution of PAA scores, score equivalencies are
permissible, strictly speaking, only as high as the mid-700s. Score equivalencies beyond
the mid-700s were obtained by extrapolation.
Introduction

Although the study of cultural differences has been of central interest to
educational and social psychologists for a long while, attempts to develop a deeper
understanding of this area have been frustrated by the absence of a common metric
by which such comparisons could be made. The reasons for this are obvious. If 
two groups differ from each other in ways that cast doubt on the validity of any 
direct comparisons between them — if, for example, they differ in language, cus- 
toms, and values — then any problem that defies direct comparisons also defies the 
construction of an unbiased metric by which one could hope to make those com- 
parisons. 

The present study represents an attempt to develop a methodology to help
make comparisons in the face of these difficulties, and to provide a conversion of
the verbal and mathematical scores on the Spanish-language Prueba de Aptitud
Academica (PAA) of the College Board to the verbal and mathematical scores,
respectively, on the College Board English-language Scholastic Aptitude Test
(SAT). Both tests, it is to be noted, are administered to secondary school students
for admission to college. The PAA is typically administered to Puerto Rican
students who are planning to attend colleges and universities in Puerto Rico; the SAT
is typically administered to mainland students who are planning to attend colleges
and universities in the continental United States. It was expected that if
conversion tables between these two tests (and score scales) were made available, direct
comparisons could be made between subgroups of individuals of the two
language-cultures who had taken only that test appropriate for them. It was also expected
that these conversion tables would help in the evaluation of the probable success
of Puerto Rican students who were interested in eventually attending colleges on
the mainland and were submitting PAA scores for admission.

Interest in developing conversions such as these has been expressed in various 
other contexts, usually in the assessment of the outcomes of education for
different cultural groups living in close proximity: for English- and French-speaking
students in Canada, for example; for English- and Afrikaans-speaking students in
South Africa; for speakers of one or another of the many languages in India or in
Africa, etc. However, no satisfactory methods to satisfy this interest have been
evident, and the problems attendant on making comparisons among culturally 
different groups are far more obvious and numerous than are the solutions. For 
example, in order to provide a measuring instrument to make these comparisons, 
it is clearly insufficient simply to translate the test constructed for one language- 
group into the language of the other, even with adjustments in the items to con- 
form to the more obvious cultural requirements of the second group. It can hardly 
be expected, without making careful and detailed checks — assuming that such 
checks can logically be made — that the translated items will have the same mean- 
ing and relative difficulty for the second group as they had for the original group 
before translation. 

A method considerably superior to that of simple translation has been described
by Boldt (1969). It requires the selection of a group of individuals who are judged
to be equally bilingual and bicultural, and the administration of two tests to each
individual, one in each of the two languages. Scores on the two tests are then
equated as though they were parallel forms of the same test, and a conversion
table is developed relating scores on each test to scores on the other.

One of the principal difficulties with the foregoing procedure is that the
judgment "equally bilingual and bicultural" is an extremely difficult, perhaps even
an impossible, one to make. More than likely, the group is more proficient, on the
average, in one of the two languages than in the other. This would be especially
true, of course, if the group is constituted from a small number of clusters of
individuals.

The present study represents an attempt to overcome such difficulties. In brief,
it calls for administering the PAA to Puerto Rican students and the SAT to mainland
U.S. (continental) students, using a set of "common," or anchor, items to calibrate
and adjust for any differences between the groups in the process of equating the
two tests. It is noted that these items are "common" only in terms of the
operations used to develop and select them. By the very nature of things they had to be
administered in Spanish to the Puerto Rican students and in English to the
continental students. Therefore, to the extent that there is any validity in the notion
that a set of test items can represent the same psychological task to individuals of
two different languages and cultures, to the extent that the sense of the operations
is acceptable, and to the extent that the operations themselves were adequate, the
study will have achieved its purpose. There is also the concern that the Puerto
Rican and continental groups appear to differ so greatly in average ability that
with the limited equating techniques available it is not likely that any set of
common items, however appropriate, can make adequate adjustments for the
differences, even if the two tests were designed for students of the same language and
culture.

There is, finally, the concern about the generalizability of a conversion between
tests that are appropriate for different cultural groups. In the usual equating
problem, a conversion function is sought that will simply translate scores on one form
of the test to the score scale on a parallel form of the test, an operation analogous
to that of translating Fahrenheit units of temperature to centigrade units.
However, when the two tests in question are measuring different types of abilities, or
when one or both of the tests may be unequally appropriate for different
subgroups of the population, the conversion cannot be unitary, as would be true of the
temperature-scale conversion, but would be different for different subgroups
(Angoff, 1966). In the case of the present equating attempt, it is entirely possible
that the use of different types of subgroups for the equating experiment (Mexicans
and Australians, for example, instead of Puerto Ricans and U.S. continentals)
would yield conversion functions quite different from those developed in the
present study. For this reason the conversions developed here should be
considered as having limited applicability, and should not be used without verification
with groups of individuals much different from those studied here.

Method 

The method followed in this study for deriving conversions of scores from the 
verbal and mathematical scales of the PAA to the verbal and mathematical scales
of the SAT consisted of two phases. The first phase entailed the selection of appro- 
priate anchor items for the equating analysis. This phase involved the preparation 
of sets of items in Spanish and in English; the translation of each set into the other 
language; and the administration of both sets in the appropriate language to both 
Spanish- and English-speaking students. On the basis of an item analysis of the 
data resulting from this administration, groups of verbal and mathematical items
were chosen to fulfill the principal requirement that they be equally appropriate,
insofar as this could be determined, for both groups of students. Beyond this, the
usual criteria for the choice of equating items as to difficulty, discrimination, and 
content coverage were adhered to wherever possible. Once the anchor items were 
chosen, the second phase was undertaken, calling for a second test administration 
and an analysis for equating, based on the data resulting from that administration. 

Phase I: Selection of Items for Equating

In accordance with this plan, 58 Spanish verbal items, 97 English verbal items, 48 
Spanish mathematical items, and 52 English mathematical items were chosen
from the files. (An effort was made to assemble equal numbers of items in Spanish 
and English, but the pool of pretested and usable Spanish items, particularly 
verbal items, did not permit this.) Each item was translated by a small team 
of bilingual experts into the other language, thus making available two complete 
sets of items, 155 verbal and 100 mathematical, each set appearing in both 
languages and, as nearly as was possible by translation, equally meaningful in both 
languages. 

At a later time, all the items ultimately selected as anchor items for equating 
were retranslated independently by different translators back into the original 
languages. When the original version of each item was compared with the version
that had undergone two translations, from the original language (Spanish or
English) to the other language (English or Spanish) and back again to the original
language, it was found that the two generally compared very well, indicating that
the translation was adequate and that the original meaning of most of the items 
seemed to have undergone no great change through the course of these two trans- 
lations, a consideration fundamental to the success of this study. 

The 155 verbal items consisted of four types: antonyms, analogies, sentence 
completion, and reading comprehension. The 100 mathematical items were of 
two types: arithmetic and algebraic reasoning problems and problems involving 
geometric concepts. Detailed information on the pretested items is given later in 
this report. 

The 155 verbal items and the 100 mathematical items were each subdivided 
into subsets of items and administered to systematic samples of regular College 
Board examinees. The items appearing in Spanish were taken by candidates for 
the Spanish-language PAA at the November 1970 administration of the PAA;
the same items, appearing in English, were taken by candidates for the
English-language SAT at the November 1970 administration of the SAT. Five systematic
samples of Puerto Rican candidates (4 of 305 cases and 1 of 310 cases) were 
formed, each taking 1 of 5 subsets of 31 verbal items in a 25-minute testing 
period. Five additional Puerto Rican samples (4 of 270 cases and 1 of 275 cases) 
were similarly formed, each taking 1 of 5 subsets of 20 mathematical items in a
25-minute testing period. Correspondingly, 8 systematic 2,000-case samples of 
continental (United States) candidates were formed, each taking 1 of 4 subsets of 
40 verbal items¹ or 1 of 4 subsets of 25 mathematical items in a 30-minute testing
period. 

Since the five sets of Spanish verbal items and the five sets of Spanish
mathematical items were administered to different, although very similarly performing,
groups of Puerto Rican students, minor equating adjustments were made in the
difficulty indexes so that comparisons across the sets of items within each domain
(verbal and mathematical) could be made directly. The method of adjustment was
essentially that described by Thurstone (1947). The same types of adjustments
were made for the items in the four verbal and four mathematical sets
administered in English. Once these adjustments were carried out it was possible to pool
all the verbal items appearing in Spanish into one undifferentiated set and all the
verbal items appearing in English into a second undifferentiated set and prepare
for the next step in the analysis. (All the mathematical items in each language
were similarly pooled into one total set.) This step, which consisted of an
examination and comparison of the performance of the Puerto Rican students with the
performance of the continental students on the "same" items, involved a
procedure which requires detailed description.

In preparation for making this comparison, the proportion p in each of the two



¹ In order to permit the formation of four subsets of 40 items each, five "filler" verbal items were added to
the 155, making a total of 160 items.
language-groups answering each item correctly is calculated and converted to Δ.²
A plot is then made of the points represented by the paired Δ-values, Δ_h vs. Δ_g,
where g represents one of the groups and h the other, one point for each of the
items i under consideration for which Δ-values are available. The plot of these
points is normally an ellipse extending from lower left to upper right, and if the 
samples are drawn from the same types of populations, the scatterplot of these 
points is a long, narrow one, often representing a correlation as high as .98 or .99. 
When the samples are somewhat different in level, the points still fall in a long 
narrow ellipse, but it is displaced vertically or horizontally, depending on which 
group is the abler one. Even when the groups differ in dispersion the points still 
fall in the same type of ellipse, but it is tilted at an angle either smaller or larger 
than 45°, depending on which sample is more dispersed. However, when the 
groups differ in type, or when the items do not all have the same meaning for the 
two groups — which may often be the case when the groups are drawn from the 
same general type of population but differ sharply in level or dispersion — the item 
difficulties will not fall in precisely the same rank order for the two groups, and 
the correlation represented by the delta points will be lower than .98 or .99, some- 
times substantially lower. The items falling at some distance from the plot may 
be regarded as contributing to the item-by-group interaction. They are the items 
that are especially more difficult for one group than for the other, relative to the 
other items, and they are the items that appear to represent different "psychologi- 
cal meanings" to the members of the two groups. 
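
As a rough illustration of the delta index and of the correlation between paired deltas, the following Python sketch converts proportions correct to Δ = 4z + 13 and correlates the two sets of values. The item proportions are invented for illustration, and the function names `delta` and `pearson_r` are hypothetical. The sketch assumes that z is the normal deviate above which the proportion p of the distribution lies, which is consistent with higher deltas denoting harder items.

```python
from math import sqrt
from statistics import NormalDist


def delta(p: float) -> float:
    """Convert a proportion correct p (0 < p < 1) to the delta index 4z + 13."""
    # Assumption: z cuts off the upper proportion p of the normal curve,
    # so that harder items (smaller p) receive larger deltas.
    z = NormalDist().inv_cdf(1.0 - p)
    return 4.0 * z + 13.0


def pearson_r(x, y):
    """Product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Invented proportions correct for five items in groups g and h.
p_g = [0.90, 0.75, 0.60, 0.45, 0.30]
p_h = [0.80, 0.55, 0.50, 0.25, 0.20]

deltas_g = [delta(p) for p in p_g]
deltas_h = [delta(p) for p in p_h]

# A high correlation means the items fall in nearly the same order of
# difficulty for both groups; a low one signals item-by-group interaction.
r = pearson_r(deltas_g, deltas_h)
print(round(r, 3))
```

An item answered correctly by half of a group receives a delta of 13 under this convention.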

The purpose of the delta plots is to enable the identification of those items that 
do in fact have different meaning for the two groups. The method developed to 
accomplish this involves the determination of the major axis of the ellipse formed 
by the plotted points, and the calculation of the perpendicular distance D_i from
each point to the line. If there were no other consideration in the choice of items,
the items represented by the smallest D_i-values would be retained; the others
would be eliminated. 

The equation used for the major axis of the ellipse is a linear one, h = Pg + Q,
where

and

(The variables g and h are, respectively, the delta values for the two groups under
consideration.) The formula for the perpendicular distance D_i of each point i in
the plot to the line is given as:



² Δ = 4z + 13, where z is a normal deviate corresponding to p; Δ is inversely related to p: the higher the
delta value, the more difficult the item.
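
The major-axis computation can be sketched in Python as follows. This is a minimal sketch that assumes the standard principal-axis expressions for the slope and intercept of the line h = Pg + Q and the usual signed point-to-line distance; these expressions are an assumption supplied for illustration, not a transcription of the report's formulas. The names `major_axis` and `perp_distance` are hypothetical. The sign is chosen so that a point lying to the right of the line (relatively harder for the group on the horizontal axis, g) receives a positive distance.

```python
from math import sqrt


def major_axis(g, h):
    """Return slope P and intercept Q of the major axis of the (g, h) scatter.

    Assumed formulas (standard principal axis, requires nonzero covariance):
        P = (s_h^2 - s_g^2 + sqrt((s_h^2 - s_g^2)^2 + 4 c^2)) / (2 c)
        Q = mean(h) - P * mean(g)
    where c = r * s_g * s_h is the covariance of g and h.
    """
    n = len(g)
    mean_g, mean_h = sum(g) / n, sum(h) / n
    var_g = sum((x - mean_g) ** 2 for x in g) / n
    var_h = sum((y - mean_h) ** 2 for y in h) / n
    cov = sum((x - mean_g) * (y - mean_h) for x, y in zip(g, h)) / n
    diff = var_h - var_g
    slope = (diff + sqrt(diff ** 2 + 4.0 * cov ** 2)) / (2.0 * cov)
    intercept = mean_h - slope * mean_g
    return slope, intercept


def perp_distance(g_i, h_i, slope, intercept):
    """Signed perpendicular distance of the point (g_i, h_i) from h = Pg + Q."""
    return (slope * g_i - h_i + intercept) / sqrt(slope ** 2 + 1.0)


# Invented deltas: group g on the horizontal axis, group h on the vertical.
deltas_g = [8.0, 10.0, 12.0, 14.0]
deltas_h = [10.0, 12.0, 14.0, 16.0]

P, Q = major_axis(deltas_g, deltas_h)
distances = [perp_distance(x, y, P, Q) for x, y in zip(deltas_g, deltas_h)]
print(P, Q, distances)
```

In this invented example every point lies exactly on the line h = g + 2, so all distances are zero; an item at (12, 12) would lie to the right of the line and receive a positive distance.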
Items were defined as "equally appropriate" to the Spanish- and
English-speaking groups on the basis of their proximity to the major axis of the delta plot.
If the ellipse itself is biased toward one group or the other, then the items chosen
as "equally appropriate" to both groups will also tend to be biased toward that
group. Thus, in addition to the fact that the item-by-group interaction between 
Spanish and English speakers is far greater, as will be observed below, than would 
be ideal for equating the scales of the tests appropriate for these groups, it should 
be noted that the final conversion may still contain some elements of bias in spite 
of the fact that the method of choosing "equally appropriate" items is intended to 
eliminate the major sources of bias. 

Following the administrations of the items in their Spanish and English forms, 
indexes of item difficulty (deltas) and discrimination (biserial correlations with 
the operational verbal or mathematical score, as appropriate) were calculated 
separately for the two language-groups. A plot was then made of the delta-value
for each item observed in the PAA group vs. the delta-value for that same item
observed in the SAT group, and a measure of the item-by-group interaction D_i for that
item (defined in the preceding section as the perpendicular distance of each point
representing the paired delta-values for each item i from the major axis of the
bivariate ellipse) was also calculated. These three indexes formed the principal
basis for the final selection of the 40 verbal and 25 mathematical items to be used 
as "quasi-common" ("anchor") items for the equating of the two scales. Items 
which were closest to the major axis of the ellipse and which were also within the 
limits set for the difficulty and discrimination indexes were the ones used for this 
purpose. 
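
The selection rule just described can be sketched in Python as a simple filter. This is only an illustration under stated assumptions: the `Item` record, the function `select_anchor_items`, the item values, and the difficulty and discrimination limits are all hypothetical, since the limits actually applied are not stated in this section. The bound of 0.8 on the absolute D-value follows the limit of ±0.8 reported for the items chosen for equating.

```python
from dataclasses import dataclass


@dataclass
class Item:
    item_id: str
    delta_sat: float   # difficulty (delta) in the SAT group
    delta_paa: float   # difficulty (delta) in the PAA group
    r_bis_sat: float   # biserial correlation in the SAT group
    r_bis_paa: float   # biserial correlation in the PAA group
    d: float           # signed perpendicular distance from the major axis


def select_anchor_items(items, n_wanted, max_abs_d, delta_limits, min_r_bis):
    """Keep items inside all limits, then prefer those closest to the axis."""
    low, high = delta_limits
    eligible = [
        it for it in items
        if abs(it.d) <= max_abs_d
        and low <= it.delta_sat <= high
        and low <= it.delta_paa <= high
        and it.r_bis_sat >= min_r_bis
        and it.r_bis_paa >= min_r_bis
    ]
    eligible.sort(key=lambda it: abs(it.d))
    return eligible[:n_wanted]


# Invented pool of five items.
pool = [
    Item("v01", 11.0, 12.5, 0.45, 0.40, 0.2),
    Item("v02", 9.0, 14.0, 0.50, 0.35, -1.6),   # too far from the axis
    Item("v03", 13.0, 14.2, 0.48, 0.15, 0.1),   # weak discrimination (PAA)
    Item("v04", 12.0, 13.4, 0.42, 0.38, -0.5),
    Item("v05", 18.5, 19.0, 0.40, 0.36, 0.3),   # outside difficulty limits
]

chosen = select_anchor_items(
    pool,
    n_wanted=2,
    max_abs_d=0.8,
    delta_limits=(6.0, 18.0),   # hypothetical difficulty limits
    min_r_bis=0.20,             # hypothetical discrimination floor
)
print([it.item_id for it in chosen])
```

In practice the content-coverage requirement mentioned earlier would also constrain the choice; the sketch omits it for brevity.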

Figure 1 gives the delta plot for the 155 verbal items, and Figure 2 gives the 
delta plot for the 100 mathematical items. Points to the right of the major axis in 
each of these ellipses represent items that were more difficult, relative to the other 
items, for the SAT group than for the PAA group. These items are represented by
positive D-values. Points to the left of the major axis represent items that are
more difficult, relative to the other items, for the PAA group, and are represented
by negative D-values. Note (see Figure 1) that 21 of the 155 verbal items were
harder for the SAT group than for the PAA group. Fifteen of these items were
originally in English; the other six were originally in Spanish. The remaining 134
verbal items were more difficult for the PAA group. As may be seen in Figure 2,
all 100 mathematical items (the 52 originally in English and the 48 originally in
Spanish) were considerably more difficult for the PAA group.

The plot of verbal items in Figure 1 is far more dispersed about the major axis
than is the corresponding plot of mathematical items in Figure 2, indicating a
much lower correlation for verbal items (.60) than for mathematical items (.85).
As was pointed out earlier in this report, the correlation between Δ-values may
be regarded as a measure of item-by-group interaction. In those instances where
the two groups are drawn from the same general population it is not unusual to
find correlations in the neighborhood of .98 and even higher. The fact that these
correlations, particularly the correlation for the verbal items, are as low as they
are suggests that the items do not have quite the same psychological meaning for
the members of these two language-groups. In a sense this is one of the most
significant findings in the present study, since it reflects in the form of statistical
data the very nature of the psychological difficulties that are likely to be
encountered in making cross-cultural studies. With respect to this study in
particular, it casts some doubt on the quality of any equating that would be carried out
with these items. Since the equating items are used to calibrate for differences in
the abilities of the PAA and SAT groups, a basic requirement for equating is that
they have the same rank order of difficulty in the two groups. Considerable
improvement, in the sense of reducing the item-by-group interaction, was achieved
in the group of verbal items, as will be shown below, by discarding the most
aberrant ones among them. Nevertheless, with item-by-group interaction effects
as large as those observed here, the concern remains that the equating might be
much less trustworthy than would be expected of an equating of two parallel tests
intended for members of the same language-culture.

Figure 1. Delta Plot for the Pretested Verbal Items (Number of Items = 155). Horizontal axis: SAT Group.
Figure 2. Delta Plot for the Pretested Mathematical Items (Number of Items = 100). Horizontal axis: SAT Group. Legend: o, items originally in Spanish; x, items originally in English; □, items selected as anchor items for equating; a distinct marker represents two observations.

It should be reiterated, however, that these interactions were not entirely un- 
expected; the observation has often been made that verbal material, however 
well it may be translated into another language, loses some of its subtleties in the 
process of translation. Even in the case of mathematical items some shift in the 
order of item difficulty is to be expected, possibly because of differences between 
Puerto Rico and the United States mainland with respect to the organization and 
emphasis of the mathematics curriculum in the early grades. 

Summary statistics (means and standard deviations, SD) for the indexes (Δ, r_bis, and
D) that were used as a basis for identification and selection of the equating items
are given in Table 1. These statistics are presented for all the items pretested in 
this study and also for those finally selected for equating. From the data given in 
Table 1 it is again clear that these items are far more difficult (and less variable in
difficulty) for the PAA group than for the SAT group, and that the difference in
average difficulty is even more pronounced for the mathematical items than for 
the verbal items. There also appears to be some tendency for the verbal items to 
become slightly more difficult after translation. Evidence of this may be found in 
the column of mean D-values, which shows that the verbal items originally in 
Spanish and translated into English are harder, relative to the other items, for the 
SAT group. The opposite is true for the items originally in English and translated 
into Spanish. These were harder, relative to the other verbal items, for the PAA
group. No such observation is made in the group of mathematical items. 

Generally speaking, the items, especially the mathematical items, have lower 
discrimination values (r_bis) in Spanish than in English. It is quite possible that, as
a result of the greater absolute difficulty of the items for the PAA group, those
students guessed more frequently than did the continental U.S. students, thereby
introducing more error into the items and depressing the item biserials. It is also 
possible that the items lost in discriminating power as a result of translation. The 
mean biserial for the verbal items appearing originally in Spanish dropped slightly 
when translated into English (.40 to .37). Those that appeared originally in English 
dropped sharply when translated into Spanish (.44 to .30). In the mathematical 
sections the items originally in Spanish gained when translated into English (.45 
to .53); but those that originally appeared in English lost considerably when trans- 
lated into Spanish (.53 to .27). 

The verbal items selected for equating had, for the PAA group, a mean biserial
considerably lower than the mean biserial found for the operational PAA-verbal
form with which these items were later to be administered for equating (.37 as
compared with .45) and, typically, for other operational forms of the PAA. The
mean biserials of the selected equating items for the SAT-verbal, SAT-mathematical,
and PAA-mathematical items, however, are well in line with the values for
the operational forms (.46 as compared with .47; .56 as compared with .54; and
.54 as compared with .53; respectively).

Perhaps most interesting in this cross-cultural study are the sizes of the correla- 
tions represented by the two delta plots. In the entire group of 155 verbal items 
the correlation between deltas was only .60, suggesting that as a group these items 
did not represent the same meaning to the two groups of students. It is noted, 
however, that the items chosen for equating were much superior in this regard. 
They represent a correlation of .87 for the same set of data. This comes as no
surprise, of course, since only those items with D-values closest to the major axis of
the plot, within the limits of ±0.8, were chosen for equating. When this correlation
was recalculated in a pair of independent samples (the ones later chosen for
use in equating the verbal tests), it dropped, as expected, to .81.

As expected, the correlation between the deltas for the 100 mathematical items
was much higher than for the 155 verbal items: .85 as against .60. After selection,
the correlation rose to .97, based on the same set of data. This rise in the correlation
between deltas was also expected in the mathematical test, since the items
selected for equating were those that had D-values within the limits ±0.5. However,
when this correlation was recalculated in a pair of independent samples
(those later chosen for use in equating the mathematical tests), it dropped to .80.
Although a drop in correlation of some magnitude was expected here too, it was
not anticipated that the drop would be so severe. An examination of this delta
plot revealed that four items were markedly (but unexplainably) aberrant, two of
them extremely so. The removal of these two items raised the correlation to .86;
the removal of all four raised it to .94.
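
The selection rule described above can be sketched in Python. This is only an illustration under stated assumptions: the major axis is taken to be the principal axis of the scatter of the two sets of deltas, D is taken as the signed perpendicular distance from that axis (the sign convention is arbitrary here), and the function names and the five pairs of deltas are hypothetical rather than data from the study.

```python
import math

def major_axis(x, y):
    """Return (slope, intercept) of the major axis of the scatter of (x, y)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    # Principal-axis slope; assumes the covariance is not zero.
    slope = ((var_y - var_x)
             + math.sqrt((var_y - var_x) ** 2 + 4 * cov ** 2)) / (2 * cov)
    return slope, mean_y - slope * mean_x

def d_values(x, y):
    """Signed perpendicular distance of each point from the major axis."""
    slope, intercept = major_axis(x, y)
    norm = math.sqrt(slope ** 2 + 1)
    return [(slope * a + intercept - b) / norm for a, b in zip(x, y)]

def select_items(x, y, limit):
    """Indexes of the items whose |D| does not exceed the limit."""
    return [i for i, d in enumerate(d_values(x, y)) if abs(d) <= limit]

# Hypothetical deltas for five items in two groups (not data from the study).
group_a = [9.0, 11.0, 12.5, 14.0, 15.0]
group_b = [12.0, 13.5, 15.5, 16.0, 19.5]
selected = select_items(group_a, group_b, limit=0.8)
```

Because the axis is fitted to the same items that are then screened, the correlation among the retained items is bound to rise, which is why the recalculation in independent samples reported above is the more informative figure.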

Table 2 is a summary of the same data as shown in Table 1, but classified by 
item type rather than by language of origin. The greater difficulty of the items 
for the PAA group is readily observable in this table, as is the smaller dispersion of 
item deltas, not only over the entire test, but also separately by item type. It is 
also clear that the items in all four of the verbal and in both of the mathematical 
item types are more discriminating for the continental students than for the Puerto 
Rican students. 

Although the item groups, and the item analysis samples they were based on, 
are too small to permit easy generalization, it appears that there is considerable 
and, very likely, significant group-by-item-type interaction, that is to say, variation
from one verbal item type to another with respect to their average departure
$D$ from the "equal appropriateness" line (major axis). (No such interaction is observed
in the mathematical items.) The analogy items especially, and to some
extent the reading-comprehension items, were more difficult, relative to the other
items, for the Puerto Rican students than for the continental students. The antonyms
and sentence completion items, on the other hand, were relatively less difficult
for the Puerto Rican than for the continental students. This appears to be a

TABLE 3 Distribution of Pretested Items, by Item Type and Language of Origin

Verbal                    Originally    Originally
                          English       Spanish       Totals

Antonyms                      24            15           39
Analogies                     20            16           36
Sentence Completion           23            17           40
Reading Comprehension         30            10           40
Totals                        97            58          155

χ² = 3.852; df = 3; .30 > P > .20

Mathematical              Originally    Originally
                          English       Spanish       Totals

Regular Mathematics           37            33           70
Geometry                      15            15           30
Totals                        52            48          100

χ² = .069; df = 1; .80 > P > .70

subtle effect, very likely characteristic of the item type itself. It is certainly not a
function of the origin of these items and their increased relative difficulty upon 
translation into the other language. As Table 3 shows, very nearly the same pro- 
portion of items for each of the item types was drawn from each of the languages. 
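
The chi-square values printed under Table 3 can be checked with a few lines of Python. The sketch below applies the ordinary Pearson chi-square test of independence to the two panels of the table; the function name is illustrative, and the cell counts are those given in Table 3.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df

# Columns: originally English, originally Spanish (counts from Table 3).
verbal = [[24, 15], [20, 16], [23, 17], [30, 10]]
mathematical = [[37, 33], [15, 15]]

verbal_stat, verbal_df = chi_square(verbal)
math_stat, math_df = chi_square(mathematical)
```

Both statistics fall well short of conventional significance levels, which is consistent with the statement that the item types drew on the two languages in nearly the same proportions.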

It is interesting that the four verbal item types arrange themselves into two 
distinct classes insofar as the correlations between their deltas are concerned, the 
higher correlations (smaller item-by-group interactions) characteristic of the 
sentence-completion and reading-comprehension plots, and the lower correla- 
tions (larger item-by-group interactions) characteristic of the antonyms and anal- 
ogies plots. This result is intuitively reasonable since items with more context 
tend to retain their meaning, even in the face of translation into another language. 

In spite of their resistance to the effects of translation, none of the reading 
comprehension items was used for the verbal equating test. The reason for this is 
that the reading comprehension items are not discrete like the others, but are 
interrelated in groups of five, each group based on a single reading passage. Al- 
though one or another of the items within a group may have passed the require- 
ments for use in the equating test, other items in the group did not; and in each 
group there were enough unusable items to render the whole group unusable. 

Phase II- Equating 

Once the 40 verbal and 25 mathematical items that were to be used as "common"
(more properly, "quasi-common") items were chosen, preparations were made
to administer them to groups of candidates taking the PAA or the SAT for admission
to college. Accordingly, two samples of candidates were chosen from the November
1971 administration of the PAA, one to take the verbal items in Spanish,
the other to take the mathematical items in Spanish, in addition to the regular
operational form of the PAA given at that time. Similarly, two samples of candidates
were chosen from the January 1972 administration of the SAT, one to take
the verbal items in English, the other to take the mathematical items in English,
in addition to the regular operational form of the SAT given at that time. The
PAA samples were chosen to represent groups somewhat higher scoring (mean
verbal score = 494; mean mathematical score = 488) than the general 1971-72
PAA candidate group (mean verbal score = 478; mean mathematical score =
484); the SAT samples, drawn systematically from the January 1972 administration,
were somewhat lower scoring (mean verbal score = 424; mean mathematical
score = 473) than the general 1971-72 SAT candidate group (mean verbal score =
450; mean mathematical score = 482). Arrangements were made to administer
the equating items in a separately timed 30-minute period to all candidates (both
Puerto Rican and continental) taking those items.

The method of equating involved first the determination of the conversion between
the raw [$R - (W/4)$] scores on the operational form of the PAA given in
November 1971 and the raw [$R - (W/4)$] scores on the operational form of the
SAT given in January 1972. The second step called for substituting into that conversion
relationship the equation between raw and scaled scores for each of the
two operational forms. The result of this work would be the score-to-score relationship
between scaled scores in the two testing programs.

Two types of equating were undertaken, linear equating and curvilinear
(equipercentile) equating, and within the general linear model two methods were
used. The first of these two linear methods, following a procedure described by
Tucker (in Angoff, 1971, p. 580), required an estimation of the mean and variance
for the combined PAA-SAT samples on each of the tests, as follows:

$$M_{xt} = M_{x\alpha} + b_{xv\alpha}\,(M_{vt} - M_{v\alpha}), \tag{1}$$

$$M_{yt} = M_{y\beta} + b_{yv\beta}\,(M_{vt} - M_{v\beta}), \tag{2}$$

$$s^2_{xt} = s^2_{x\alpha} + b^2_{xv\alpha}\,(s^2_{vt} - s^2_{v\alpha}), \tag{3}$$

and

$$s^2_{yt} = s^2_{y\beta} + b^2_{yv\beta}\,(s^2_{vt} - s^2_{v\beta}), \tag{4}$$

where $x$ or $X$ = the test (PAA) taken by group $\alpha$ (Puerto Rican); $y$ or $Y$ = the test
(SAT) taken by group $\beta$ (continental U.S.); $v$ or $V$ = the score on the "common items,"
i.e., scores on the items taken in Spanish by the Puerto Rican candidates and
scores on the "same" items taken in English by the continental U.S. candidates;
and $t$ = the combined group $\alpha + \beta$.

The notation $b_{xv}$ represents the usual coefficient of regression of variable $x$ on
variable $v$: $b_{xv} = r_{xv}s_x/s_v$. (Similarly, $b_{yv} = r_{yv}s_y/s_v$.) The estimated values $M_{xt}$,
$M_{yt}$, $s_{xt}$, and $s_{yt}$ are then substituted in the equation

$$\frac{Y - M_{yt}}{s_{yt}} = \frac{X - M_{xt}}{s_{xt}} \tag{5}$$

to yield the equation

$$Y = aX + b, \tag{6}$$

converting the scale of the raw scores on test $X$ to the scale of the raw scores on
test $Y$. In this equation $a = s_{yt}/s_{xt}$, and $b = M_{yt} - aM_{xt}$.
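
As a rough illustration of equations (1) through (6), the following Python sketch carries the Tucker calculation through for the verbal tests, using the summary statistics reported in Table 4. Two things here are assumptions rather than part of the study: the class and function names, and the exactly equal weighting of the two samples in forming the combined group (the report says only that the samples were weighted about equally). For that reason the result should be expected to come close to, not match exactly, the values reported later in equation (16).

```python
import math
from dataclasses import dataclass

@dataclass
class GroupStats:
    mean_test: float  # mean on the operational test (X or Y)
    sd_test: float    # SD on the operational test
    mean_v: float     # mean on the common items V
    sd_v: float       # SD on the common items V
    r: float          # correlation, operational test vs. V

def combined_v(alpha, beta, w_alpha=0.5):
    """Mean and variance of V in the combined group (weights are an assumption)."""
    w_beta = 1.0 - w_alpha
    mean = w_alpha * alpha.mean_v + w_beta * beta.mean_v
    var = (w_alpha * alpha.sd_v ** 2 + w_beta * beta.sd_v ** 2
           + w_alpha * w_beta * (alpha.mean_v - beta.mean_v) ** 2)
    return mean, var

def tucker_estimates(group, mean_vt, var_vt):
    """Equations (1)-(4): estimated mean and SD for the combined group."""
    b = group.r * group.sd_test / group.sd_v
    mean_t = group.mean_test + b * (mean_vt - group.mean_v)
    var_t = group.sd_test ** 2 + b ** 2 * (var_vt - group.sd_v ** 2)
    return mean_t, math.sqrt(var_t)

def tucker_line(alpha, beta, w_alpha=0.5):
    """Equation (6): slope a and intercept b of Y = aX + b."""
    mean_vt, var_vt = combined_v(alpha, beta, w_alpha)
    m_xt, s_xt = tucker_estimates(alpha, mean_vt, var_vt)
    m_yt, s_yt = tucker_estimates(beta, mean_vt, var_vt)
    a = s_yt / s_xt
    return a, m_yt - a * m_xt

# Verbal summary statistics from Table 4.
paa_verbal = GroupStats(32.1047, 14.2739, 13.2448, 8.8839, 0.8488)
sat_verbal = GroupStats(31.4334, 17.3455, 19.4479, 8.7787, 0.8833)
a, b = tucker_line(paa_verbal, sat_verbal)
```

The weight given to each sample is exposed as a parameter because the intercept in particular is sensitive to where the combined-group mean on the common items is placed.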

The second type of linear equating, due to Levine (1955), is based on the conversions
of true rather than observed scores, and is applicable when the tests
($X$ and $Y$) to be equated are unequally reliable. In this procedure, when the equating
test $V$ is exclusive and experimentally independent of $X$ and $Y$, the slope $a'$
and intercept $b'$ of the conversion equation

$$Y = a'X + b' \tag{7}$$

are calculated as follows:

$$a' = n_{yv}/n_{xv}, \tag{8}$$

and

$$b' = M_{yt} - a'M_{xt}, \tag{9}$$
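where

$$M_{xt} = M_{x\alpha} + n_{xv}\,(M_{vt} - M_{v\alpha}), \tag{10}$$

$$M_{yt} = M_{y\beta} + n_{yv}\,(M_{vt} - M_{v\beta}), \tag{11}$$

and where, in general, $n_{ij}$ is the ratio of effective test length of test $i$ to test $j$
(Angoff, 1953):

$$n_{ij} = \frac{s_i^2 + r_{ij}s_is_j}{s_j^2 + r_{ij}s_is_j}. \tag{12}$$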
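
A hedged Python sketch of the Levine calculation follows, again using the verbal statistics from Table 4. The form of the effective-test-length ratio used here, (variance of test i plus covariance) divided by (variance of test j plus covariance), is an assumption: it is a form appropriate to an anchor test that is external to the operational tests, and it was chosen because it reproduces the values of n reported for the verbal tests. The function names and the exactly equal weighting of the two samples are also assumptions.

```python
def effective_length_ratio(sd_i, sd_j, r_ij):
    """Assumed form of n_ij for an external anchor test."""
    cov = r_ij * sd_i * sd_j
    return (sd_i ** 2 + cov) / (sd_j ** 2 + cov)

def levine_line(alpha, beta, mean_vt):
    """Slope a' and intercept b' of Y = a'X + b', with the two n values."""
    n_xv = effective_length_ratio(alpha["sd"], alpha["sd_v"], alpha["r"])
    n_yv = effective_length_ratio(beta["sd"], beta["sd_v"], beta["r"])
    m_xt = alpha["mean"] + n_xv * (mean_vt - alpha["mean_v"])
    m_yt = beta["mean"] + n_yv * (mean_vt - beta["mean_v"])
    slope = n_yv / n_xv
    return n_xv, n_yv, slope, m_yt - slope * m_xt

# Verbal summary statistics from Table 4.
paa = {"mean": 32.1047, "sd": 14.2739, "mean_v": 13.2448, "sd_v": 8.8839, "r": 0.8488}
sat = {"mean": 31.4334, "sd": 17.3455, "mean_v": 19.4479, "sd_v": 8.7787, "r": 0.8833}

# Equal weighting of the two samples is an assumption ("about equally" in the report).
mean_vt = (paa["mean_v"] + sat["mean_v"]) / 2
n_xv, n_yv, a_prime, b_prime = levine_line(paa, sat, mean_vt)
```

Note that the slope depends only on the two effective-length ratios, so it does not change with the weighting assumption; only the intercept does.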



The derivation of the conversion from the PAA-verbal reporting scale to the
SAT-verbal reporting scale is developed as follows. The linear equation

$$S_p = AX + B \tag{13}$$

is the equation by which raw scores on the form of the PAA used in November
1971 ($X$) are converted to the PAA reporting scale ($S_p$). Similarly, the linear
equation

$$S_c = A'Y + B' \tag{14}$$

is the equation by which raw scores on the form of the SAT used in January 1972
($Y$) are converted to the SAT scale ($S_c$). Expressing equation (13) in terms of $X$
[$X = (S_p - B)/A$] and equation (14) in terms of $Y$ [$Y = (S_c - B')/A'$], and substituting
in equation (6) results in the equation

$$\frac{S_c - B'}{A'} = a\,\frac{S_p - B}{A} + b,$$

which, when simplified, becomes

$$S_c = \frac{aA'}{A}\,S_p + A'b + B' - \frac{aA'B}{A}. \tag{15}$$

Equation (15) is a linear equation with slope equal to $aA'/A$ and intercept equal to
$A'b + B' - (aA'B/A)$, and may be used to convert verbal or mathematical scores
from the November 1971 converted-score scale for the PAA to corresponding
scores on the SAT scale.
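
The substitution that leads to equation (15) can be checked numerically. The short Python sketch below computes the slope and intercept of the scale-to-scale line from the raw-score conversion (a, b) and the two raw-to-scale conversions (A, B) and (A', B'). The function name is illustrative; the numerical inputs are the verbal values that the report gives later for the Tucker method.

```python
def scale_to_scale(a, b, A, B, A_prime, B_prime):
    """Slope and intercept of equation (15): S_c as a linear function of S_p."""
    slope = a * A_prime / A
    intercept = A_prime * b + B_prime - slope * B
    return slope, intercept

# Verbal values reported for the Tucker method.
slope, intercept = scale_to_scale(
    a=1.2306, b=-18.7017,
    A=7.1424, B=264.1965,
    A_prime=6.3075, B_prime=225.4387,
)
```

The same function serves for the Levine method and for the mathematical tests once the corresponding constants are substituted.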

The curvilinear, or equipercentile, equating between raw scores on the PAA and
the SAT followed a procedure described by Angoff (1971, p. 583) involving the
following steps: (1) equating the scores on the operational form of the PAA to the
scores on the Spanish version of the "common items"; (2) equating the scores on
the operational form of the SAT to the scores on the English version of the "common
items"; and (3) setting equivalent scores on the PAA and SAT that were found
to be equivalent to the same "common item" scores.

TABLE 4 Frequency Distributions and Summary Statistics for the Operational and Equating
Sections of the SAT and PAA

Verbal Tests

                      Continental Sample            Puerto Rican Sample
Raw (Formula)         Operational   Equating        Operational   Equating
Score                 SAT           Section         PAA           Section

 84 -  86                   2
 81 -  83                   3
 78 -  80                   8
 75 -  77                  10
 72 -  74                  22
 69 -  71                  30
 66 -  68                  45                             1
 63 -  65                  53                             4
 60 -  62                  63                            17
 57 -  59                  99                            24
 54 -  56                 106                            53
 51 -  53                 129                            66
 48 -  50                 131                            72
 45 -  47                 180                            72
 42 -  44                 180                            84
 39 -  41                 205            13              99
 36 -  38                 223            67              72             5
 33 -  35                 250           177             103            21
 30 -  32         [illegible]   [illegible]     [illegible]            34
 27 -  29                 265           375             106            56
 24 -  26                 238           427              99            92
 21 -  23                 206           428              91            93
 18 -  20                 247           495              94           165
 15 -  17                 187           423              62           122
 12 -  14                 163           365              61           149
  9 -  11                 148           321              61           168
  6 -   8                 112           195              19           163
  3 -   5                 110           148              17           184
  0 -   2                  73            73              10            77
 -3 -  -1                  36            22               3            41
 -6 -  -4                  19             6               1            15
 -9 -  -7                   4
-12 - -10                   3

Number of Cases         3,798         3,798           1,385         1,385
Mean                  31.4334       19.4479         32.1047       13.2448
SD                    17.3455        8.7787         14.2739        8.8839

Correlation:
Operational vs. Equating     .8833                        .8488

Number of Items            90            40              70            40

NOTE: The operational PAA-verbal and SAT-verbal tests were 70 and 90 items respectively. The
verbal equating test, administered to both the PAA and SAT groups (in their own language mode),
consisted of 40 items.



Linear Equating 

In order to calculate the estimated values (for the verbal tests) for the Tucker
equating given in equations (1) through (4) and the values for the Levine equating
given in equations (8) through (11), the correlations between the operational test
and the 40-item equating section,* as well as the related means and standard
deviations, were prepared for each of the two verbal samples, one consisting of
3,798 continental students and the other consisting of 1,385 Puerto Rican students.
These statistics, accompanying the frequency distributions of the operational
and equating verbal tests, are given in Table 4.

The data of Table 4 make it clear that, to the extent that the "common items"
are in fact appropriate for both groups of examinees, the continental sample is the
higher-scoring of the two, by about 0.7 standard deviations. Additional observations
may be made regarding the operational tests: The 70-item PAA appears to be
only slightly too difficult, on the average, for the Puerto Rican sample; the average
percentage-pass on that test (corrected for guessing) was .46. The 90-item SAT is
clearly difficult for the continental sample; the average percentage-pass on that
test (also corrected for guessing) was .35. These observations are confirmed by
the shapes of the distributions in Table 4, which, except for the distribution of the
equating-section scores for the continental sample, are all positively skewed.

The patterns of standard deviations and correlations observed in Table 4 between
the equating test in English and the SAT and between the equating test in
Spanish and the PAA suggest that each of these verbal equating tests is virtually
parallel in function to the operational test with which it is paired.

The application of the statistics in Table 4 to equations (1) through (4) and to
equations (8) through (11) resulted in the following values:†

Tucker Method            Levine Method

$M_{xt}$ = 36.1445           $M_{xt}$ = 37.0490
$M_{yt}$ = 25.7771           $M_{yt}$ = 24.7645
$s_{xt}$ = 14.8306           $n_{xv}$ = 1.6691
$s_{yt}$ = 18.2502           $n_{yv}$ = 2.0578

From these values the following equations, permitting the conversion of scores
from the raw-score scale of the PAA-verbal test ($X$) to the raw-score scale of the
SAT-verbal test ($Y$), were determined under the Tucker and Levine methods:

$$Y = 1.2306X - 18.7017 \quad \text{(Tucker)}, \tag{16}$$

and

$$Y = 1.2329X - 20.9136 \quad \text{(Levine)}. \tag{17}$$

* Recall that 115 verbal "outlier" items were removed from the original group of 155 items administered to
both the continental and the Puerto Rican examinees.

† Since the PAA and the SAT must be regarded as appropriate only for the cultural group for which each was
separately designed, each "combined group" value must be interpreted as an estimate of the performance of the
combined group assuming that the test in question was appropriate for all members of the combined group. The
Puerto Rican and mainland samples were weighted about equally in making these calculations.

In order to derive the numerical conversion from the PAA-verbal reporting scale
to the SAT-verbal reporting scale under the Tucker method, the following numerical
values for the slopes and intercepts of equations (6), (13), and (14) were applied
to the constants in equation (15):

$a = 1.2306$, $b = -18.7017$ [from equation (6) by Tucker method],

$A = 7.1424$, $B = 264.1965$ [from equation (13)],

and

$A' = 6.3075$, $B' = 225.4387$ [from equation (14)].

The resulting scale-to-scale conversion for the verbal test, derived by the Tucker
method of linear equating, is, therefore,

$$S_c = 1.0868\,S_p - 179.6381. \tag{18}$$

The numerical conversions from the PAA-verbal reporting scale to the SAT-verbal
reporting scale under the Levine method were obtained from equations (7),
(13), and (14), using the following conversion parameters in a relationship precisely
equivalent to that shown in equation (15), except that $a'$ is applied instead
of $a$ and $b'$ instead of $b$:

$a' = 1.2329$, $b' = -20.9136$ [from equation (7) by Levine method],

$A = 7.1424$, $B = 264.1965$ [from equation (13)],

and

$A' = 6.3075$, $B' = 225.4387$ [from equation (14)].

The resulting scale-to-scale conversion for the verbal test derived by the Levine
method of linear equating now becomes:

$$S_c = 1.0888\,S_p - 194.1262. \tag{19}$$

Comparison of the Tucker and Levine conversions for the verbal tests shows a
constant difference of about 13 points throughout the range of scaled scores, with
the Levine conversions yielding lower SAT equivalents. This result is predictable
from the fact that the PAA group had a lower mean on the equating items than did
the SAT group.

Because the data of this study failed to satisfy the assumptions of either the
Tucker or the Levine equating methods entirely, the final linear conversion equation
for transforming the PAA-verbal scale $S_p$ to the SAT-verbal scale $S_c$ was taken
to be the bisector of the two lines given, respectively, in equations (18) and (19).
The equation of the bisector is:

$$S_c = 1.0878\,S_p - 186.8779, \tag{20}$$

from which the following equivalencies were determined:

PAA-V        Equivalent SAT-V
Score        Score (Linear)

800          683
700          575
600          466
500          357
400          248
300          (139)*
200          (31)*



A more detailed linear conversion table for the verbal tests is provided in Table
I of the Appendix (page 35). However, it is clear from the foregoing list of
equivalencies (assuming a linear model for equating) that the difference between
the two scales is in the vicinity of 140-145 points at a PAA score of 500. The differences
are larger, however, at the lower end of the scale and become progressively
smaller at the higher score levels.
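
The bisector and the list of verbal equivalencies can be reproduced approximately with the Python sketch below. It treats the bisector as the line through the intersection of the Tucker and Levine lines whose angle with the horizontal is the mean of their two angles; that reading of "bisector," the function names, and the round-half-up rule are assumptions. The inputs are the rounded coefficients of equations (18) and (19), so the result agrees with equation (20) only to within rounding.

```python
import math

def bisector(m1, c1, m2, c2):
    """Angle bisector (lying between the two lines) of y = m1*x + c1 and y = m2*x + c2."""
    x0 = (c1 - c2) / (m2 - m1)          # intersection point; lines must not be parallel
    y0 = m1 * x0 + c1
    m = math.tan((math.atan(m1) + math.atan(m2)) / 2)
    return m, y0 - m * x0

def convert(score, m, c):
    """Apply the linear conversion and round half up to a whole score."""
    return math.floor(m * score + c + 0.5)

# Tucker (18) and Levine (19) scale-to-scale lines for the verbal tests.
m, c = bisector(1.0868, -179.6381, 1.0888, -194.1262)
equivalents = {paa: convert(paa, m, c) for paa in range(800, 100, -100)}
```

Because the two slopes differ by so little, the bisector is nearly the simple average of the two lines; the distinction matters more for the mathematical tests, where the Tucker and Levine slopes are further apart.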

Directly parallel procedures were followed in deriving the equation for converting
scaled scores on the PAA-mathematical sections to scaled scores on the
SAT-mathematical sections. In order to calculate the estimated values given in
equations (1) through (4) and in equations (8) through (11) for the mathematical
tests, the correlations between the operational test and the 25-item equating section,†
as well as the related means and standard deviations, were prepared for
each of the two mathematical samples, one consisting of 3,867 continental students
and the other of 1,060 Puerto Rican students. These statistics are given in
Table 5 along with the frequency distributions of the mathematical operational
and equating tests.

The mathematical equating data in Table 5 reveal even more sharply than do the
verbal equating data in Table 4 that the continental sample is the higher scoring of
the two. The mean difference in the mathematical "common items" is about 1.5
standard deviations. Also, note that as in the case of the verbal test, the operational
PAA-mathematical test was more appropriate in difficulty for the PAA sample
(percentage-pass, corrected for guessing = .44) than was the SAT-mathematical
test for the SAT sample (percentage-pass, corrected for guessing = .38). The distributions
in Table 5, which show moderate positive skewness on the SAT for the
continental sample, confirm this observation. The most striking observations to be
made from Table 5, however, are the extreme negative skew in the distribution of

* Scores lower than 200 on both the SAT and the PAA are reported as 200.

† Recall that 75 mathematical "outlier" items were removed from the original group of 100 items administered
to both the continental and the Puerto Rican examinees.

TABLE 5 Frequency Distributions and Summary Statistics for the Operational and Equating
Sections of the SAT and PAA

Mathematical Tests

                      Continental Sample            Puerto Rican Sample
Raw (Formula)         Operational   Equating        Operational   Equating
Score                 SAT           Section         PAA           Section

 58 -  59                   4
 56 -  57                  11
 54 -  55                  11                             9
 52 -  53                  21                            12
 50 -  51                  30                            20
 48 -  49                  55                            25
 46 -  47                  49                            21
 44 -  45                  71                            45
 42 -  43                  77                            30
 40 -  41                  99                            36
 38 -  39                 118                            49
 36 -  37                 136                            24
 34 -  35                 163                            51
 32 -  33                 160                            44
 30 -  31                 200                            49
 28 -  29                 197                            59
 26 -  27                 202                            47
 24 -  25                 224           882              58            20
 22 -  23                 197           499              54            21
 20 -  21                 220           562              61            46
 18 -  19                 220           453              72            49
 16 -  17                 161   [illegible]     [illegible]   [illegible]
 14 -  15                 182           272              57            55
 12 -  13                 176           208              56            56
 10 -  11                 166           179              67            68
  8 -   9                 176           157              29            82
  6 -   7                 148           118              12            86
  4 -   5                 120           105              14           131
  2 -   3                  93            76               6           104
  0 -   1                  87            63               6           154
 -2 -  -1                  53            41               4            96
 -4 -  -3                  22             8                            34
 -6 -  -5                  13             4                            13
 -8 -  -7                   4
-10 -  -9                   1

Number of Cases         3,867         3,867           1,060         1,060
Mean                  22.6499       17.6025         26.4066        7.1868
SD                    13.1246        6.8164         12.7172        7.3849

Correlation:
Operational vs. Equating     .8206                        .8781

Number of Items            60            25              55            25

NOTE: The operational PAA-mathematical and SAT-mathematical tests contained 55 and 60 items
respectively. The mathematical equating test, administered to both the PAA and SAT groups (in
their own language mode), consisted of 25 items.



equating test scores for the continental sample and the positive skew on that test
for the Puerto Rican sample, again pointing to the vast difference in difficulty of
the "common" mathematics items for these two groups.

As was true of the verbal data in Table 4, the patterns of standard deviations
and correlations in Table 5 between the equating test in English and the SAT and
between the equating test in Spanish and the PAA suggest that each of these
mathematical equating tests is virtually parallel in function to the operational test
with which it is paired.

The application of the statistics in Table 5 to equations (1) through (4) and to
equations (8) through (11) results in the following values:*

Tucker Method            Levine Method

$M_{xt}$ = 35.0494           $M_{xt}$ = 36.5940
$M_{yt}$ = 15.2236           $M_{yt}$ = 13.0176
$s_{xt}$ = 14.5954           $n_{xv}$ = 1.7824
$s_{yt}$ = 15.7612           $n_{yv}$ = 2.0494

These values were then applied to yield the equations for the mathematical tests
(corresponding to those for the verbal tests in equations (16) and (17) above), as
follows:

$$Y = 1.0799X - 22.6253 \quad \text{(Tucker)}, \tag{21}$$

and

$$Y = 1.1498X - 29.0573 \quad \text{(Levine)}, \tag{22}$$

permitting the conversion of scores from the raw-score scale of the PAA-mathematical
test to the raw-score scale of the SAT-mathematical test.

In order to derive the Tucker conversion from the PAA-mathematical reporting
scale to the SAT-mathematical reporting scale, the following numerical values from
the slopes and intercepts of equations (6), (13), and (14) were applied to the constants
in equation (15):

$a = 1.0799$, $b = -22.6253$ [from equation (6) by Tucker method],

$A = 8.3361$, $B = 268.2090$ [from equation (13)],

and

$A' = 8.5584$, $B' = 279.2013$ [from equation (14)].

The resulting conversion for the mathematical test under the Tucker method is,
therefore,

$$S_c = 1.1087\,S_p - 211.7978. \tag{23}$$

Similarly, the scale-to-scale conversion of the mathematical test obtained under
the Levine method by applying the results of equation (22) (namely, $a' = 1.1498$
and $b' = -29.0573$) and the scaled-score parameters $A$, $B$, $A'$, and $B'$ in the preceding
paragraph to an equation precisely parallel to equation (15) is as follows:

$$S_c = 1.1805\,S_p - 286.0932. \tag{24}$$

* Since the PAA and the SAT must be regarded as appropriate only for the cultural group for which each was
separately designed, each "combined group" value must be interpreted as an estimate of the performance of the
combined group assuming that the test in question was appropriate for all members of the combined group. The
Puerto Rican and continental samples were weighted about equally in making these calculations.

Comparison of the Tucker and Levine conversions for the mathematical tests
reveals substantial differences, ranging from 60 points at a PAA score of 200 to 17
points at a PAA score of 800, with (as in the verbal conversions) the Levine conversions
yielding the lower SAT equivalents. As with the verbal conversions, but
to a much more pronounced degree, the difference between the two conversions
is predictable from the fact that the PAA group had a lower mean on the equating
items than the SAT group did.
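
The size of the gap between the two methods follows directly from the reported conversion lines. The Python sketch below evaluates the Tucker and Levine scale-to-scale equations at several PAA scores and takes the difference; the function and variable names are illustrative, and the coefficients are those of equations (18), (19), (23), and (24).

```python
def gap(tucker, levine, score):
    """Tucker minus Levine SAT equivalent at a given PAA scaled score."""
    (m_t, c_t), (m_l, c_l) = tucker, levine
    return (m_t * score + c_t) - (m_l * score + c_l)

# (slope, intercept) pairs from equations (18), (19), (23), and (24).
verbal_tucker, verbal_levine = (1.0868, -179.6381), (1.0888, -194.1262)
math_tucker, math_levine = (1.1087, -211.7978), (1.1805, -286.0932)

verbal_gaps = {s: gap(verbal_tucker, verbal_levine, s) for s in (200, 500, 800)}
math_gaps = {s: gap(math_tucker, math_levine, s) for s in (200, 500, 800)}
```

The verbal gap stays within about a point of 13 to 14 across the scale because the two slopes are nearly equal, whereas the mathematical gap shrinks from about 60 to about 17 because the Levine line is noticeably steeper.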

As with the verbal conversions, the bisector of the two lines represented by
equations (23) and (24) was used as the final conversion line for transforming the
PAA-mathematical scale $S_p$ to the SAT-mathematical scale $S_c$. Its equation is:

$$S_c = 1.1440\,S_p - 248.2851, \tag{25}$$

from which the following equivalencies were determined:

PAA-M        Equivalent SAT-M
Score        Score (Linear)

800          667
700          553
600          438
500          324
400          209
300          (95)†
200          (-18)†

As do the verbal equivalencies, these (linear) equivalencies for the mathematical
tests show striking differences between the PAA and SAT scales. In the vicinity
of a PAA score of 500 there is a difference of 175-180 points. However, as with
the verbal equivalencies, the differences are larger at the lower end of the scale
and become progressively smaller at higher score levels.

A more detailed linear conversion table for the mathematical tests is provided
in Table I of the Appendix (page 35).

† Scores lower than 200 on both the PAA and SAT are reported as 200.



Curvilinear Equating 

The curvilinear (equipercentile) method of equating, outlined above, was also
used to determine the equivalent raw scores on PAA and SAT tests. These equivalent
scores were obtained by first equating raw scores on test $X$ (administered to
the PAA group) to raw scores on the common test $V$ (also given to the same PAA
group) by setting equal scores at the same percentile rank on the distributions for
tests $X$ and $V$. Similarly, raw scores at the same percentile ranks on the distributions
for tests $Y$ and $V$ (taken by the SAT group) were also equated. Then, for each
score on test $V$, the equivalent scores on tests $X$ and $Y$ were found, plotted, and
smoothed to yield a conversion from $X$ to $Y$.
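
The chain of equatings just described can be sketched in Python as follows. This is a minimal illustration, not the procedure actually programmed for the study: percentile ranks are taken at score midpoints, equivalents are found by linear interpolation, the hand smoothing step is omitted, and the small frequency distributions are hypothetical.

```python
from bisect import bisect_left

def percentile_ranks(freqs):
    """Midpoint percentile rank for each score, scores in ascending order."""
    total = sum(freqs)
    ranks, below = [], 0
    for f in freqs:
        ranks.append(100.0 * (below + 0.5 * f) / total)
        below += f
    return ranks

def interpolate(x, xs, ys):
    """Piecewise-linear interpolation of ys over ascending xs, clamped at the ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_left(xs, x)  # guarantees xs[i - 1] < x <= xs[i]
    return ys[i - 1] + (ys[i] - ys[i - 1]) * (x - xs[i - 1]) / (xs[i] - xs[i - 1])

def equate(score, source, target):
    """Score on target with the same percentile rank as `score` on source."""
    src_scores, src_freqs = source
    tgt_scores, tgt_freqs = target
    rank = interpolate(score, src_scores, percentile_ranks(src_freqs))
    return interpolate(rank, percentile_ranks(tgt_freqs), tgt_scores)

def chained_equipercentile(x, x_dist, v_dist_alpha, v_dist_beta, y_dist):
    """X to V within group alpha, then V to Y within group beta."""
    v = equate(x, x_dist, v_dist_alpha)
    return equate(v, v_dist_beta, y_dist)

# Hypothetical (scores, frequencies) pairs, not data from the study.
x_dist = ([0, 1, 2, 3], [1, 2, 2, 1])
v_alpha = ([0, 1, 2, 3], [1, 2, 2, 1])
v_beta = ([0, 1, 2, 3], [1, 2, 2, 1])
y_dist = ([10, 11, 12, 13], [1, 2, 2, 1])
y_equivalent = chained_equipercentile(2, x_dist, v_alpha, v_beta, y_dist)
```

In this toy case the distributions differ only by a shift of 10 points, so the equivalents simply shift by 10; with real distributions of different shape the conversion would be curved.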

The equated raw scores on tests $X$ and $Y$ were then converted to their corresponding
scaled scores from equations (13) and (14):

$$S_p = AX + B, \tag{13}$$

and

$$S_c = A'Y + B'. \tag{14}$$

These equations, for converting raw scores to scaled scores, are restated in
numerical terms as follows:

Verbal

PAA: $A$ = 7.1424     $B$ = 264.1965
SAT: $A'$ = 6.3075    $B'$ = 225.4387

Mathematical

PAA: $A$ = 8.3361     $B$ = 268.2090
SAT: $A'$ = 8.5584    $B'$ = 279.2013

from which the following equivalencies were determined:

PAA-V        Equivalent SAT-V
Score        Score (Curvilinear)

800          *
700          628
600          449
500          342
400          260
300          200
200

PAA-M        Equivalent SAT-M
Score        Score (Curvilinear)

800          *
700          519
600          386
500          313
400          266
300          215
200

* Score equivalencies above the mid-700s on the PAA are unavailable because of the scarcity of data in the
upper region of the distribution of PAA scores.

The curvilinear equivalencies tell essentially the same story as do the linear
equivalencies, that there is a wide difference between the PAA and SAT scales. In
these equivalencies there is about a 155-160 point difference in verbal scores and
about a 185-190 point difference in mathematical scores at a PAA score of 500.
Detailed curvilinear conversion tables are provided in Table II of the Appendix
(page 36).

The final conversions between the PAA and SAT scales chosen for operational
use are the averages of the linear and curvilinear equatings, one for the verbal
tests, the other for the mathematical tests. The detailed equivalency tables appear
in Table III of the Appendix (page 37). A summary of these equivalencies
follows:

PAA-V      Equivalent SAT-V       PAA-M      Equivalent SAT-M
Score      Score (Average)        Score      Score (Average)

800        767*                   800        †
700        602                    700        536
600        458                    600        412
500        350                    500        319
400        254                    400        238



Graphs of the linear and curvilinear equatings, as well as the final conversions 
(which are the averages of the two), appear in Figures 3 and 4. 
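
The averaging can be illustrated with the summary equivalencies listed earlier. The Python sketch below averages the rounded linear and curvilinear equivalents at each PAA score where both are available and rounds halves upward. Both choices are assumptions made for illustration: the study presumably averaged the unrounded conversions, and its rounding rule is not stated. The variable and function names are also illustrative.

```python
import math

# Rounded equivalents taken from the linear and curvilinear summary lists.
linear_verbal = {700: 575, 600: 466, 500: 357, 400: 248}
curvilinear_verbal = {700: 628, 600: 449, 500: 342, 400: 260}
linear_math = {700: 553, 600: 438, 500: 324, 400: 209}
curvilinear_math = {700: 519, 600: 386, 500: 313, 400: 266}

def average_conversion(linear, curvilinear):
    """Mean of the two equivalents at each shared PAA score, rounded half up."""
    return {
        score: math.floor((linear[score] + curvilinear[score]) / 2 + 0.5)
        for score in linear
        if score in curvilinear
    }

final_verbal = average_conversion(linear_verbal, curvilinear_verbal)
final_math = average_conversion(linear_math, curvilinear_math)
```

Rounding halves upward is used rather than Python's built-in `round`, which rounds halves to the nearest even integer and would turn 318.5 into 318 instead of the 319 shown in the summary.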

The essentials of the relationships between the score scales for the PAA and the
SAT that are observed in these final equivalency tables have already been described
in connection with the linear and equipercentile results presented earlier
in this report. The tables indicate that a PAA midscale value (500) is equivalent to
an SAT-verbal score substantially below midvalue (350), and an even lower (319)
SAT-mathematical score.

Some attention should be given to the meaning of the differences in these scales.
The fact that a 500 score on the PAA corresponds to a lower-than-500 score on the
SAT simply says that if one can assume that the SAT and PAA values have been
maintained precisely since the time of their inception, it can be concluded that the
original scaling group for the SAT was generally more able in the abilities measured
by these aptitude tests than the original scaling group for the PAA. It does not by
itself imply that the SAT candidate group today is necessarily more able than the
PAA group, although this is in fact the case; the 1971-72 PAA candidate group
earned mean scores on their own scale of 478 on the verbal test and 484 on the
mathematical test, which convert, respectively, to about 328 and 304 on the SAT
scale. The 1971-72 SAT candidate group earned considerably higher mean scores:
450 on the verbal test and 482 on the mathematical test. Nor does it necessarily
suggest any generalization regarding the larger populations from which these two
examinee groups were self-selected, for example, that the twelfth-grade students



Extrapolated value. 
" Further extrapolation was not possible in this region. 
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Figure 3. Linear, Curvili/iear, and Final (A vera^^e) Conversions for the Verbal Tests 

on the mainland are higher scoring than the twelfth-grade students in Puerto Rico. 
We know, for example, that the sat examinee group represents about one-third of 
the twelfth-grade population on the mainland and is therefore a more selective 
group than its paa counterpart, which represents a larger proportion, about two- 
thirds, of the twelfth-grade population in Puerto Rico. On the other hand, this is 
not to say that differences between the two twelfth-grade populations do not also 
exist. There is some evidence, however Crude, that marked differences do exist. 
But this evidence is outside the scope of the present study. 
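
The conversion of the PAA group means (478 verbal, 484 mathematical) to about 328 and 304 on the SAT scale can be illustrated with a short sketch. The report does not say how scores falling between tabled values were converted; straight-line interpolation between adjacent Table III entries is an assumption here, and the function name is hypothetical.

```python
def interpolate(table, paa_score):
    """Linearly interpolate between the two tabled PAA scores that bracket paa_score.

    `table` maps PAA scores (multiples of 10) to SAT equivalents. Linear
    interpolation is an assumed method, not one stated in the report.
    """
    lower = int(paa_score // 10) * 10
    if paa_score == lower:
        return float(table[lower])
    upper = lower + 10
    fraction = (paa_score - lower) / 10
    return table[lower] + fraction * (table[upper] - table[lower])


# Adjacent final (average) conversions copied from Table III of the Appendix.
final_verbal = {470: 321, 480: 330}
final_mathematical = {480: 301, 490: 309}

print(round(interpolate(final_verbal, 478)))        # PAA-verbal mean
print(round(interpolate(final_mathematical, 484)))  # PAA-mathematical mean
```

With these table entries the interpolated values round to 328 and 304, in agreement with the figures quoted in the text.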
In view of these and other possible misinterpretations of the data of this study,
it will be useful to restate the limited purpose for which the present investigation
was undertaken: to derive a set of conversions between two similar-appearing
scales of measurement, one for tests of one language and culture, the other for
tests of a different language and culture. Clearly, the accuracy of these conversions
is limited by the appropriateness of the method used to derive them and the
data assembled during the course of the study. It is hoped that these conversions
will be useful in a variety of contexts but (as suggested by the examples cited here)
in order to be useful they will need in each instance to be supported by additional
data peculiar to the context.

Figure 4. Linear, Curvilinear, and Final (Average) Conversions for the
Mathematical Tests

Summary and Discussion 

The purpose of this study was to establish score equivalencies between the College
Board Scholastic Aptitude Test (SAT) and its Spanish-language equivalent, the
College Board Prueba de Aptitud Academica (PAA). The method of the study involved
two phases: the selection of test items equally appropriate and useful for
Spanish- and English-speaking students for use in the equating of the two tests; and
the equating analysis itself. The method of the first phase was to choose two sets
of items, one originally appearing in Spanish, the other originally appearing in
English; to translate each set into the other language; and to administer both sets
in the appropriate language mode for pretest purposes to both types of students.
These administrations were conducted in the fall of 1970 with samples of candidates
taking the PAA or the SAT at regularly scheduled administrations. They provided
data regarding the difficulty and discrimination power of each item for each
of the two groups and, what was of special interest, an index of the appropriateness
of each item for each group.

On the basis of the analyses of these data, two sets of items, one verbal and the
other mathematical, were chosen and assembled as "common items" to be used
for equating. In the second phase of the study these "common items," appearing
both in Spanish and in English, were administered in the appropriate language
along with the operational form of the PAA in November 1971 and with the operational
form of the SAT in January 1972. The data resulting from the administrations
of these "common items" were used to calibrate for differences in the abilities
of the two groups of candidates, and permitted both linear and curvilinear
(equipercentile) equating of the two tests. Conversion tables relating the
PAA-verbal scores to the SAT-verbal scores and the PAA-mathematical scores to the
SAT-mathematical scores are given in the Appendix (page 37). These conversions
represent an average of the linear and equipercentile results. Because of the
scarcity of data at the upper end of the distribution of PAA scores, score
equivalencies are permissible, strictly speaking, only as high as the mid-700s. Score
equivalencies beyond the mid-700s were obtained by extrapolation.

The procedure followed in conducting this study requires special discussion, 
perhaps all the more because it is, at least superficially, a simple one both in its 
conception and in its execution. On the other hand, from a psychological view- 
point the task of making cross-cultural comparisons of the kind made here is 
highly complex. In the extreme the task is inescapably an impossible one, and 
although the present study may represent a reasonably successful attempt, it 
should be remembered that the cultural differences confronted by the present 
study were minimal and relatively easily bridged. If, for example, the two cultures 
under consideration were very different, then there would be little or no common 
basis for comparison. 

Given, then, that the cultures out of which the tests in the present study were 
developed are to some extent similar, and that there is indeed a basis for com- 
parison, the approach and method offered in this study do appear to have some 
likelihood of success. Indeed, the method itself is useful not only in providing a 
type of metric for utilizing the common basis for comparison, but also in providing 
a basis for evaluating the degree to which there is a common basis for comparison. 
For example, it allows a comparison of the two cultures only on a common ground,
which is to say only on those items that are relatively close to the major axis of
the ellipse. This being the case, those characteristics of the two cultures that make
them uniquely different are in essence removed from consideration in making the
comparisons. Thus, while we are afforded an opportunity to compare the two cultures
on a common basis (i.e., on the items that are "equally appropriate"), at the
same time we are afforded an opportunity to examine the differences in the two
cultures in the terms provided by the divergent or "unequally appropriate" items.
It is noteworthy that what emerges out of this study — and other studies that have 
also made use of the delta-plot technique, e.g., Angoff and Ford (1971) — is that 
the method described here also yields a general measure of cultural similarity, ex- 
pressed in the size of the correlation represented by the delta plots. The correla- 
tion (or statistics derived from the correlation, such as the standard deviation of 
the D-values) summarizes the degree to which members of the two cultures per- 
ceive the item stimuli similarly. Additional studies of the similarity of any two 
cultures would have to be based on other stimuli examined in a wide variety of 
different social contexts. 
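
The delta-plot quantities mentioned in this paragraph (the correlation, the major axis of the ellipse, and the D-values) can be sketched as follows. This is an illustration under stated assumptions, not the study's computation: the delta transformation is the conventional one (mean 13, standard deviation 4, harder items receiving higher deltas), the major axis is the principal axis of the scatter, the sign convention for D is arbitrary, and the proportions correct are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, pstdev


def delta(p_correct):
    """Map a proportion correct (strictly between 0 and 1) onto the delta scale."""
    return 13.0 + 4.0 * NormalDist().inv_cdf(1.0 - p_correct)


def major_axis(x, y):
    """Return slope, intercept, and correlation for the principal axis of the plot."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # must be nonzero
    slope = ((syy - sxx) + sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    return slope, my - slope * mx, sxy / sqrt(sxx * syy)


def perpendicular_distances(x, y, slope, intercept):
    """Signed perpendicular distance (D-value) of each item from the major axis."""
    return [(slope * a + intercept - b) / sqrt(slope ** 2 + 1) for a, b in zip(x, y)]


# Hypothetical proportions correct for five items in each group (not study data).
p_group_1 = [0.85, 0.70, 0.55, 0.40, 0.25]
p_group_2 = [0.80, 0.60, 0.52, 0.20, 0.22]

x = [delta(p) for p in p_group_1]
y = [delta(p) for p in p_group_2]
slope, intercept, r = major_axis(x, y)
d_values = perpendicular_distances(x, y, slope, intercept)

print("correlation of deltas:", round(r, 3))
print("SD of D-values:", round(pstdev(d_values), 3))
for item, d in enumerate(d_values, start=1):
    print("item", item, "D =", round(d, 2))
```

Items with D-values near zero lie close to the major axis and would be candidates for the "equally appropriate" set; items with large absolute D-values are the divergent ones.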

It should also be made clear that the method has its limitations, as do the results 
of this study which has followed the method. For example, the present study has 
leaned on the usefulness of translations from each of the two languages to the 
other, and the assumption has been made that biases in translation, if they exist, 
tend to balance out. This assumption may not be a tenable one, however. Quite 
possibly translation may be easier and freer of bias when going from language A 
to language B than in the reverse direction; and if items do become somewhat 
more difficult in an absolute sense as a result of translation, this effect would be 
more keenly felt by speakers of language A than of language B. The result of this 
effect is that the central tendency of the elliptical plot of common items would 
experience a net bias. Also, implicit in the method of this study is the assumption 
that language mirrors all the significant cultural effects. This may not be so, and it 
is possible that the translatability of words and concepts across two languages 
does not accurately reflect the degree of similarity in the cultures represented by 
those two languages. If, for example, there are greater differences in the languages 
than in the cultures — as may perhaps be the case between the German and Hun- 
garian languages as compared with the German and Hungarian cultures — then 
again the method is subject to some bias. 

Aside from matters of methodology and possible sources of bias, a point that
has been made earlier in this report deserves repeating: The comparison in this
study was made between Puerto Rican and continental U.S. students; the resulting
conversions between the PAA and the SAT apply only between these two
groups of students. Whether the same conversions would also have been found
had the study been conducted between the PAA and the SAT as taken by other
Spanish speakers and other English speakers is an open question. Indeed, it is an
open question whether the conversion obtained here also applies to variously
defined subgroups of the Puerto Rican and continental populations (liberal arts
women, engineering men, urban blacks, etc.).

It is also to be hoped that the conversions between the two types of tests will
not be used without a clear recognition of the realities: A Puerto Rican student
with a PAA-verbal score of 680 has a score "equivalent" to an SAT-verbal score of
568. This is not to say that that student could actually earn an SAT-verbal score
of 568 were he to take the SAT. He might do better, or he might do worse, depending,
obviously, on his facility in English. The conversions do offer a way of evaluating
his general aptitude for verbal and mathematical materials in terms familiar
to users of SAT scores; and, depending on how well he can be expected to learn the
English language, his likelihood of success in competition with native English
speakers in the continental United States can be estimated. Continuing study of
the comparative validity of the PAA and the SAT for predicting the performance of
Puerto Rican students in mainland colleges is indispensable to the judicious use
of these conversions.
TABLE I Linear Conversions of PAA Scaled Scores to SAT Scaled Scores



                Verbal                                    Mathematical

 PAA-V   Equivalent  PAA-V   Equivalent       PAA-M   Equivalent  PAA-M   Equivalent
 Score     SAT-V     Score     SAT-V          Score     SAT-M     Score     SAT-M
           Score               Score                    Score               Score

  800       683       490       346         800       667       490       312
  790       672       480       335         790       655       480       301
  780       662       470       324         780       644       470       289
  770       651       460       314         770       633       460       278
  760       640       450       303         760       621       450       267
  750       629       440       292         750       610       440       255
  740       618       430       281         740       598       430       244
  730       607       420       270         730       587       420       232
  720       596       410       259         720       575       410       221
  710       585       400       248         710       564       400       209
  700       575       390       237         700       553       390      (198)
  690       564       380       226         690       541       380      (186)
  680       553       370       216         680       530       370      (175)
  670       542       360       205         670       518       360      (164)
  660       531       350      (194)        660       507       350      (152)
  650       520       340      (183)        650       495       340      (141)
  640       509       330      (172)        640       484       330      (129)
  630       498       320      (161)        630       472       320      (118)
  620       488       310      (150)        620       461       310      (106)
  610       477       300      (139)        610       450       300       (95)
  600       466       290      (129)        600       438       290       (83)
  590       455       280      (118)        590       427       280       (72)
  580       444       270      (107)        580       415       270       (61)
  570       433       260       (96)        570       404       260       (49)
  560       422       250       (85)        560       392       250       (38)
  550       411       240       (74)        550       381       240       (26)
  540       401       230       (63)        540       369       230       (15)
  530       390       220       (52)        530       358       220        (3)
  520       379       210       (42)        520       347       210
  510       368       200       (31)        510       335       200
  500       357                             500       324

NOTE: The lowest score reported on both the PAA and the SAT is the score of 200.
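
Because Table I is a linear conversion, any two of its entries determine the line approximately. The sketch below recovers a slope and intercept from two tabled points for each test and checks them against the tabled value at PAA 500. The recovered coefficients are only approximations derived from rounded table entries, not the equating parameters estimated in the study, and the function names are hypothetical.

```python
def line_through(point_1, point_2):
    """Return (slope, intercept) of the straight line through two (PAA, SAT) points."""
    (x1, y1), (x2, y2) = point_1, point_2
    slope = (y2 - y1) / (x2 - x1)
    return slope, y1 - slope * x1


def predict(slope, intercept, paa_score):
    """Apply the recovered line to a PAA scaled score."""
    return slope * paa_score + intercept


# Widely separated entries copied from Table I (rounded values).
verbal_slope, verbal_intercept = line_through((200, 31), (800, 683))
math_slope, math_intercept = line_through((220, 3), (800, 667))

print(round(verbal_slope, 3), round(predict(verbal_slope, verbal_intercept, 500)))
print(round(math_slope, 3), round(predict(math_slope, math_intercept, 500)))
```

Both lines have slopes a little above 1 and fall well below the PAA scale throughout, which is the pattern described in the text: a PAA score of 500 corresponds to roughly 357 (verbal) and 324 (mathematical) under the linear method.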
TABLE II Curvilinear Conversions of PAA Scaled Scores to SAT Scaled Scores



                Verbal                                    Mathematical

 PAA-V   Equivalent  PAA-V   Equivalent       PAA-M   Equivalent  PAA-M   Equivalent
 Score     SAT-V     Score     SAT-V          Score     SAT-M     Score     SAT-M
           Score               Score                    Score               Score

  800                 490       334         800                 490       305
  790                 480       324         790                 480       301
  780                 470       317         780                 470       296
  770                 460       307         770                 460       292
  760                 450       299         760                 450       288
  750                 440       290         750                 440       283
  740                 430       282         740       660       430       279
  730       686       420       275         730       598       420       272
  720       666       410       268         720       570       410       268
  710       648       400       260         710       545       400       266
  700       628       390       252         700       519       390       258
  690       606       380       246         690       502       380       255
  680       583       370       240         680       482       370       250
  670       565       360       234         670       467       360       245
  660       541       350       227         660       455       350       241
  650       525       340       220         650       439       340       236
  640       503       330       213         640       431       330       230
  630       487       320       207         630       418       320       224
  620       475       310       201         620       409       310       219
  610       459       300                   610       396       300       215
  600       449       290                   600       386       290       207
  590       438       280                   590       378       280       202
  580       424       270                   580       367       270      (198)
  570       413       260                   570       361       260
  560       402       250                   560       353       250
  550       393       240                   550       344       240
  540       383       230                   540       339       230
  530       372       220                   530       331       220
  520       363       210                   520       326       210
  510       353       200                   510       320       200
  500       342                             500       313

NOTE: The lowest score reported on both the PAA and the SAT is the score of 200.



TABLE III Final Conversions Between PAA Scaled Scores and SAT Scaled Scores

                Verbal                                    Mathematical

 PAA-V   Equivalent  PAA-V   Equivalent       PAA-M   Equivalent  PAA-M   Equivalent
 Score     SAT-V     Score     SAT-V          Score     SAT-M     Score     SAT-M
           Score               Score                    Score               Score

  800       767a      490       340         800       --b       490       309
  790       750a      480       330         790       --b       480       301
  780       733a      470       321         780       --b       470       293
  770       715a      460       311         770       --b       460       285
  760       698a      450       301         760       --b       450       278
  750       680a      440       291         750       650a      440       269
  740       663a      430       282         740       629       430       262
  730       647       420       273         730       593       420       252
  720       631       410       264         720       573       410       245
  710       617       400       254         710       555       400       238
  700       602       390       245         700       536       390       228
  690       585       380       236         690       522       380       221
  680       568       370       228         680       506       370       213
  670       554       360       220         670       493       360       205
  660       536       350       211         660       481       350      (197)
  650       523       340       202         650       467       340      (189)
  640       506       330      (193)        640       458       330      (180)
  630       493       320      (184)        630       445       320      (171)
  620       482       310      (176)        620       435       310      (163)
  610       468       300                   610       423       300      (155)
  600       458       290                   600       412       290      (145)
  590       447       280                   590       403       280      (137)
  580       434       270                   580       391       270
  570       423       260                   570       383       260
  560       412       250                   560       373       250
  550       402       240                   550       363       240
  540       392       230                   540       354       230
  530       381       220                   530       345       220
  520       371       210                   520       337       210
  510       361       200                   510       328       200
  500       350                             500       319

NOTE: The lowest score reported on both the PAA and the SAT is the score of 200. For
operational use these conversions should be rounded to the nearest multiple of 10.
a Extrapolated values.
b Further extrapolation was not possible in this region.
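
The operational rounding mentioned in the note to Table III can be sketched briefly. The SAT equivalents below are copied from Table III; the function name is hypothetical, and the treatment of values ending in 5 (rounded upward here) is an assumption, since the note does not specify a tie rule.

```python
def round_to_nearest_ten(score):
    """Round to the nearest multiple of 10; values ending in 5 go up (assumed tie rule)."""
    return (score + 5) // 10 * 10


# PAA-verbal score -> final SAT-verbal equivalent, copied from Table III.
final_verbal = {700: 602, 680: 568, 600: 458, 500: 350, 400: 254}

for paa, sat in final_verbal.items():
    print(paa, sat, round_to_nearest_ten(sat))
```

For example, the PAA-verbal score of 680 discussed in the text has a tabled equivalent of 568, which would be reported operationally as 570.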



